Dynamic Language Model Adaptation Using Latent Topical Information and Automatic Transcripts
Abstract
This paper investigates dynamic language model adaptation for Mandarin broadcast news recognition. A topical mixture model was presented to dynamically explore long-span latent topical information for language model adaptation. Its underlying characteristics and several model structures were investigated extensively, and its performance was compared with conventional MAP-based adaptation approaches, which extract short-span n-gram information. The fusion of global topical and local contextual information was investigated as well. Speech recognition experiments were conducted on broadcast news collected in Taiwan. Both contemporary newswire texts and in-domain automatic transcripts were exploited for language model adaptation. Initial results showed very promising reductions in both perplexity and word error rate.
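As a rough illustration of how global topical and local contextual information can be fused, the sketch below re-estimates topic weights from the recognized word history and linearly interpolates the resulting topic-mixture unigram with a background n-gram probability. This is a generic topic-mixture formulation, not necessarily the model structure used in the paper. The function names, the EM re-estimation of the weights, and the interpolation weight `lam` are all assumptions made for the example.

```python
import numpy as np

def topic_posterior(history_ids, p_w_given_t, n_iter=20):
    """Estimate topic weights P(T_k | history) by EM, keeping topic unigrams fixed.

    p_w_given_t has shape (n_topics, vocab_size); every entry is assumed to be
    strictly positive (i.e. smoothed), so no division by zero can occur.
    """
    n_topics = p_w_given_t.shape[0]
    weights = np.full(n_topics, 1.0 / n_topics)   # start from uniform weights
    if len(history_ids) == 0:
        return weights
    word_probs = p_w_given_t[:, history_ids]      # (n_topics, len(history))
    for _ in range(n_iter):
        joint = weights[:, None] * word_probs     # E-step numerators
        resp = joint / joint.sum(axis=0, keepdims=True)
        weights = resp.mean(axis=1)               # M-step: average responsibility
    return weights

def adapted_prob(word_id, history_ids, p_w_given_t, p_background, lam=0.5):
    """Interpolate the topic-mixture unigram with a background n-gram probability.

    p_background stands in for P(w | previous n-1 words) from a static n-gram
    model; lam is a hypothetical interpolation weight that would be tuned on
    held-out data.
    """
    weights = topic_posterior(history_ids, p_w_given_t)
    p_topic = float(weights @ p_w_given_t[:, word_id])
    return lam * p_topic + (1.0 - lam) * p_background

# Toy example: 2 topics over a 4-word vocabulary (made-up numbers).
topics = np.array([[0.4, 0.4, 0.1, 0.1],
                   [0.1, 0.1, 0.4, 0.4]])
history = [0, 1, 0]                               # history dominated by topic 0
print(adapted_prob(1, history, topics, p_background=0.25))
print(adapted_prob(2, history, topics, p_background=0.25))
```

In this toy setup the history consists of words favored by the first topic, so the adapted probability of word 1 comes out higher than that of word 2, although both share the same background probability.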
Similar resources
Rapid Unsupervised Topic Adaptation – a Latent Semantic Approach
In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition we rapidly adapt a language model on a source l...
Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk
This paper proposes an improved approach to extractive summarization of spoken multi-party interaction, in which integrated random walk is performed on a graph constructed on topical/lexical relations. Each utterance is represented as a node of the graph, and the edges’ weights are computed from the topical similarity between the utterances, evaluated using probabilistic latent semantic analys...
Language modeling and transcription of the TED corpus lectures
Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using respectively 8 hours of TED transcripts and ...
Imperfect transcript driven speech recognition
In many cases, textual information can be associated with speech signals, such as movie subtitles, theater scenarios, broadcast news summaries, etc. This information can be considered an approximate transcript and rarely corresponds to the exact words uttered. The goal of this work is to use this kind of information to improve the performance of an automatic speech recognition (ASR) system....
Dynamic Language Model Adaptation usin
We propose an unsupervised dynamic language model (LM) adaptation framework using long-distance latent topic mixtures. The framework employs the Latent Dirichlet Allocation model (LDA) which models the latent topics of a document collection in an unsupervised and Bayesian fashion. In the LDA model, each word is modeled as a mixture of latent topics. Varying topics within a context can be modele...
Publication date: 2005